{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Semi-supervised Community Detection using Graph Convolutional Network\n",
    "=====================\n",
    "\n",
    "Predicting community memberships of a network of entities is a common task in many real-world scenarios\n",
    "working with graph data. In this tutorial, we demonstrate how to implement a Graph Convolutional Network (GCN)\n",
    "[Kipf & Welling](https://arxiv.org/abs/1609.02907) using DGL to solve one such community detection problem in\n",
    "a semi-supervised setting."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "More specifically, you will learn:\n",
    "- How to load graph data to DGLGraph\n",
    "- How to manipulate node/edge features on the graph\n",
    "- How to write a Graph Convolutional Layer using message passing APIs\n",
    "- Train the model and visualize the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# A bit of setup, just ignore this cell\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# for auto-reloading external modules\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "%matplotlib inline\n",
    "plt.rcParams['figure.figsize'] = (8.0, 6.0) # set default size of plots\n",
    "plt.rcParams['image.interpolation'] = 'nearest'\n",
    "plt.rcParams['image.cmap'] = 'gray'\n",
    "plt.rcParams['animation.html'] = 'html5'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Zachery's Karate Club\n",
    "---\n",
    "We start by creating the well-known *\"Zachary's karate club\"* social network. The network captures 34 members of a karate club, documenting pairwise links between members who interacted outside the club. The club later splits into two communities led by the instructor (node 0) and club president (node 33). You could read more about the story in the [wiki page](https://en.wikipedia.org/wiki/Zachary%27s_karate_club) A visualization of the network and the community is as follows:\n",
    "\n",
    "![karate](https://www.dropbox.com/s/uqzor4lqsmbnz8k/karate1.jpg?dl=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Load graph data\n",
    "---\n",
    "Let's see how we can load such graph to DGL. We start with importing `dgl` and other relevant packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "import dgl\n",
    "# Load MXNet as backend\n",
    "dgl.load_backend('mxnet')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "DGL is platform-agnostic and can support multiple popular tensor-based frameworks such as [PyTorch](https://pytorch.org) and [MXNet](https://mxnet.apache.org/). In this session, we use MXNet/Gluon backend and provide equivalent Pytorch-based implementation in the comments."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "To create a graph in dgl, use `g = dgl.DGLGraph(graph_data)`. We support a wide range of `graph_data`. Here are some examples:\n",
    "\n",
    "* An edge list (e.g. `[(0, 1), (1, 2), ...]`)\n",
    "* A [`networkx`](https://networkx.github.io/) graph object.\n",
    "* A scipy sparse matrix.\n",
    "\n",
    "Since `networkx` already provides an API to create a karate club network, we can create a DGLGraph from it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "import networkx as nx\n",
    "\n",
    "G = dgl.DGLGraph(nx.karate_club_graph())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "And you are encouraged to read [our documentations](https://docs.dgl.ai/api/python/graph.html#adding-nodes-and-edges) to also learn about APIs that manipulate graph structures."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Let's print out how many nodes and edges are there in this graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "print('#Nodes', G.number_of_nodes())\n",
    "print('#Edges', G.number_of_edges())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "We can also perform queries on the graph structures."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Get the in-degree of node 0:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "G.in_degree(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Get the predecessors of node 0:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "G.predecessors(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "DGLGraph can be converted to `networkx` very easily. For example, we can utilize `networkx` to visualize the graph:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "nx_G = G.to_networkx()\n",
    "pos = nx.circular_layout(nx_G)\n",
    "nx.draw(nx_G, pos, with_labels=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Manipulating node/edge data\n",
    "---\n",
    "\n",
    "Nodes and edges in `DGLGraph` can have feature tensors. Features of multiple nodes/edges are batched on the first dimension. Let's start by assigning a random feature vector of length 5 to all nodes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "import mxnet as mx\n",
    "import mxnet.ndarray as nd\n",
    "\n",
    "G.ndata['feat'] = nd.random.randn(34, 5)\n",
    "\n",
    "# >>> for torch users\n",
    "# G.ndata['feat'] = torch.randn((34, 5))\n",
    "# <<<"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Now each node has a feature vector `'feat'` that has 5 elements. Note since there are 34 nodes in this graph, the first dimension must be of size 34, so that each row corresponds to the feature vector of each node. Error will be raised if the dimension mismatches:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "# This will raise error!!\n",
    "# G.ndata['wrong_feat'] = nd.random.randn(35, 5)\n",
    "\n",
    "# >>> for torch users\n",
    "# G.ndata['wrong_feat'] = torch.randn((35, 5))\n",
    "# <<<"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The `G.ndata` is a dictionary-like structure, so it is compatible with any operation on dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "# Use `dict.update` to add new features (vector of length 3)\n",
    "G.ndata.update({'another_feat' : nd.random.randn(34, 3)})\n",
    "# >>> for torch users\n",
    "# G.ndata.update({'another_feat' : torch.randn((34, 3))})\n",
    "# <<<\n",
    "\n",
    "# Print the feature dictionary\n",
    "print(G.ndata)\n",
    "\n",
    "# Delete the new feature\n",
    "del G.ndata['another_feat']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Similar to `G.ndata` and `G.nodes`, we have `G.edata` and `G.edges` to access and modify edge features:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "# The broness edge feature is just a scalar.\n",
    "G.edata['broness'] = nd.ones((G.number_of_edges(),))\n",
    "# >>> for torch users\n",
    "# G.edata['broness'] = torch.ones((G.number_of_edges(),))\n",
    "# <<<"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "You can also use `G.edges[src, dst]` to read/write features of a subset of edges."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "# The instructor (node 0) is a tough guy, so his friends are a little bit scared of him.\n",
    "G.edges[G.predecessors(0), 0].data['broness'] *= 0.5\n",
    "\n",
    "print(G.edata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Define a GCN layer using message passing paradigm\n",
    "---\n",
    "\n",
    "Graph convolutional network (GCN) is a popular model proposed by [Kipf & Welling](https://arxiv.org/abs/1609.02907) to encode graph structure. The model consists of several layers, each perform convolution-like operation defined on graph:\n",
    "$$\n",
    "Y=\\hat{A}XW\n",
    "$$\n",
    "\n",
    ", where $X$ is the node embedding tensor (stacked along the first dimension), $W$ is a projection matrix (the weight parameter) and $\\hat{A}$ is the normalized adjacency matrix:\n",
    "$$\n",
    "\\hat{A} = D^{-\\frac{1}{2}}AD^{-\\frac{1}{2}}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The equations above involve a matrix multiplication between the normalized adjacency matrix and node features. And this can be expressed in terms of **message passing paradigm**:\n",
    "* message phase: all nodes first compute and send out messages along out-going edges.\n",
    "* reduce phase: all node then collect in-coming messages, aggregate them and update their own embedding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import mxnet.gluon as gluon\n",
    "import mxnet.gluon.nn as nn\n",
    "\n",
    "# Define the GraphConv module\n",
    "class GraphConv(gluon.Block):\n",
    "    def __init__(self, in_feats, out_feats):\n",
    "        super(GraphConv, self).__init__()\n",
    "        self.linear = nn.Dense(out_feats)\n",
    "    \n",
    "    def forward(self, g, inputs):\n",
    "        # g is the graph and the inputs is the input node features\n",
    "        # first perform linear transformation\n",
    "        h = self.linear(inputs)\n",
    "        # set the node features\n",
    "        g.ndata['h'] = h\n",
    "        # trigger message passing, using the message_func and reduce_func.\n",
    "        g.update_all(message_func, reduce_func)\n",
    "        # get the result node features\n",
    "        return g.ndata.pop('h')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# >>> For torch users\n",
    "#\n",
    "# import torch.nn as nn\n",
    "# import torch.nn.functional as F\n",
    "#\n",
    "# # Define the GraphConv module\n",
    "# class GraphConv(nn.Module):\n",
    "#     def __init__(self, in_feats, out_feats):\n",
    "#         super(GraphConv, self).__init__()\n",
    "#         self.linear = nn.Linear(in_feats, out_feats)\n",
    "#    \n",
    "#     def forward(self, g, inputs):\n",
    "#         # g is the graph and the inputs is the input node features\n",
    "#         # first perform linear transformation\n",
    "#         h = self.linear(inputs)\n",
    "#         # set the node features\n",
    "#         g.ndata['h'] = h\n",
    "#         # trigger message passing, using the defined message_func and reduce_func.\n",
    "#         g.update_all(message_func, reduce_func)\n",
    "#         # get the result node features\n",
    "#         h = g.ndata.pop('h')\n",
    "#         return h\n",
    "# <<<"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Now let's see how we define `message_func` and `reduce_func` in DGL:\n",
    "\n",
    "Suppose the current embedding of node $i$ after the linear transformation (i.e. multiplying weight matrix $W$) is $h_i$. From the equation of GCN above, each node sends out the embedding after linear transformation to their neighbors. Then the message from node $j$ to node $i$ can be computed as\n",
    "$$m_{j\\rightarrow i} = h_j$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "So the message function takes out node feature `h` as the message, and can be defined using DGL's built-in functions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import dgl.function as fn\n",
    "message_func = fn.copy_u('h', 'm')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Each node aggregates received messages by summation. And for simplicity, we first ignore the normalization. So the aggregated messages on node $i$ can be computed as\n",
    "$$\\tilde{h_i} = \\sum\\limits_{j\\in \\mathcal{N}(i)}m_{j\\rightarrow i}$$, where $\\mathcal{N}(j)$ is the neighbor set of node $i$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reduce_func = fn.sum('m', 'h')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "We then define a two-layer Graph Convolutional Network using the above module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "# Define a 2-layer GCN model\n",
    "class GCN(gluon.Block):\n",
    "    def __init__(self, in_feats, hidden_size, num_classes):\n",
    "        super(GCN, self).__init__()\n",
    "        self.conv1 = GraphConv(in_feats, hidden_size)\n",
    "        self.conv2 = GraphConv(hidden_size, num_classes)\n",
    "    \n",
    "    def forward(self, g, inputs):\n",
    "        h = self.conv1(g, inputs)\n",
    "        h = nd.relu(h)\n",
    "        h = self.conv2(g, h)\n",
    "        return h"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# >>> For torch users\n",
    "# \n",
    "# class GCN(nn.Module):\n",
    "#     def __init__(self, in_feats, hidden_size, num_classes):\n",
    "#         super(GCN, self).__init__()\n",
    "#         self.gcn1 = GraphConv(in_feats, hidden_size)\n",
    "#         self.gcn2 = GraphConv(hidden_size, num_classes)\n",
    "#     \n",
    "#     def forward(self, g, inputs):\n",
    "#         h = self.gcn1(g, inputs)\n",
    "#         h = torch.relu(h)\n",
    "#         h = self.gcn2(g, h)\n",
    "#         return h\n",
    "# <<<"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Now let's train this model to predict the club membership after the split. To train the model, we adopt Kipf's semi-supervised setting:\n",
    "* Only the instructor node (node 0) and the president node (node 33) are labeled.\n",
    "* The initial node feature is a one-hot encoding of the node id."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "inputs = nd.eye(34)  # featureless inputs\n",
    "labeled_nodes = nd.array([0, 33])  # only the instructor and the president nodes are labeled\n",
    "labels = nd.array([0, 1])  # their labels are different"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "from mxnet import autograd\n",
    "net = GCN(34, 5, 2)\n",
    "net.initialize()\n",
    "trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 0.01})\n",
    "loss_fn = gluon.loss.SoftmaxCELoss()\n",
    "\n",
    "all_logits = []\n",
    "for epoch in range(30):\n",
    "    with autograd.record():\n",
    "        logits = net(G, inputs)\n",
    "        # we only compute loss for node 0 and node 33\n",
    "        loss = loss_fn(logits[labeled_nodes], labels).sum()\n",
    "    all_logits.append(logits.detach())\n",
    "    \n",
    "    loss.backward()\n",
    "    trainer.step(batch_size=1)\n",
    "    \n",
    "    print('Epoch %d | Loss: %.4f' % (epoch, loss.asscalar()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# >>> For torch users\n",
    "# inputs = torch.eye(34)  # featureless inputs\n",
    "# labeled_nodes = torch.tensor([0, 33])  # only the instructor and the president nodes are labeled\n",
    "# labels = torch.tensor([0, 1])  # their labels are different\n",
    "# net = GCN(34, 5, 2)\n",
    "# optimizer = torch.optim.Adam(net.parameters(), lr=0.001)\n",
    "# \n",
    "# all_logits = []\n",
    "# for epoch in range(30):\n",
    "#     logits = net(G, inputs)\n",
    "#     all_logits.append(logits.detach())\n",
    "#     logp = F.log_softmax(logits, 1)\n",
    "#     # we only compute loss for node 0 and node 33\n",
    "#     loss = F.nll_loss(logp[labeled_nodes], labels)\n",
    "# \n",
    "#     optimizer.zero_grad()\n",
    "#     loss.backward()\n",
    "#     optimizer.step()\n",
    "# \n",
    "#     print('Epoch %d | Loss: %.4f' % (epoch, loss.item()))\n",
    "# <<<"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Now let's visualize the results. Since the final node embedding is a vector of length two (for predicting two classes), we can plot it as a point on a 2D plot and visualize how the final embeddings cluster towards each other."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "# Visualize the node classification using the logits output.\n",
    "import numpy as np\n",
    "import matplotlib.animation as animation\n",
    "from IPython.display import HTML\n",
    "\n",
    "fig = plt.figure(dpi=150)\n",
    "fig.clf()\n",
    "ax = fig.subplots()\n",
    "nx_G = G.to_networkx()\n",
    "def draw(i):\n",
    "    cls1color = '#00FFFF'\n",
    "    cls2color = '#FF00FF'\n",
    "    pos = {}\n",
    "    colors = []\n",
    "    for v in range(34):\n",
    "        pos[v] = all_logits[i][v].asnumpy()\n",
    "        cls = np.argmax(pos[v])\n",
    "        colors.append(cls1color if cls else cls2color)\n",
    "    ax.cla()\n",
    "    ax.axis('off')\n",
    "    ax.set_title('Epoch: %d' % i)\n",
    "    nx.draw(nx_G.to_undirected(), pos, node_color=colors, with_labels=True, node_size=500)\n",
    "\n",
    "ani = animation.FuncAnimation(fig, draw, frames=len(all_logits), interval=200)\n",
    "HTML(ani.to_html5_video())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Advanced: writing arbitrary message and reduce function \n",
    "---\n",
    "\n",
    "DGL provides many message and reduce functions to express different GNN variants. For instance, we support `u_mul_e` as message function and `max` as reduce function. A full list of built-in message and reduce functions can be found [here](https://docs.dgl.ai/features/builtin.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "However, there is always a chance to go beyond this list. In DGL, you can define arbitrary message and reduce function in python function. Here is how you can define the same computation as the built-in `copy_u` and `sum`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "# Same message function as `fn.copy_u`\n",
    "def gcn_message_udf(edges):\n",
    "    return {'m' : edges.src['h']}\n",
    "\n",
    "# Same reduce function as `fn.sum`\n",
    "def gcn_reduce_udf(nodes):\n",
    "    return {'h' : nodes.mailbox['m'].sum(dim=1)}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "However, using DGL's built-in functions is much faster because we can map them to efficient CPU/GPU kernels while user-defined python functions can only be invoked in python side."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Exercise: implementing the normalization\n",
    "---\n",
    "The GCN model we have implemented above uses the unnormalized adjacency matrix $A$. In this exercise, you will implement the normalization. Based on the equation:\n",
    "$$\n",
    "Y = D^{-\\frac{1}{2}}AD^{-\\frac{1}{2}}XW\n",
    "$$\n",
    ", the message and reduce function will be rewritten as follows:\n",
    "$$\n",
    "\\text{message phase: }m_{j\\rightarrow i}=\\frac{1}{\\sqrt{d_j}}h_j\n",
    "$$\n",
    "$$\n",
    "\\text{reduce phase: }\\tilde{h}_i=\\frac{1}{\\sqrt{d_i}}\\sum_{j\\in\\mathcal{N}(i)}m_{j\\rightarrow i}\n",
    "$$\n",
    "\n",
    "Now your task here is to implement this normalization.\n",
    "\n",
    "Hint 1: you can use `G.in_degrees(G.nodes())` to get a 1-D tensor containing the degrees of all the nodes.\n",
    "\n",
    "Hint 2: you can try UDF message function to perform multiplication between source node feature and edge feature. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
